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Executive Summary^ 



This document reports on researeh related to large-seale assessments for students with learning 
disabilities in the area of reading. As part of a proeess of making assessments more universally de- 
signed we examined the role of “readable and eomprehensible” test items (Thompson, Johnstone, 
& Thurlow, 2002). In this researeh, we used think aloud methods to better understand how inter- 
ventions to improve readability affeeted student performanee. Deereasing word eounts in items 
and making important words bold did not seem to make any differenee in student aehievement 
(although students preferred that important words were printed in bold). Voeabulary, however, 
was a notable faetor. Non-eonstruet voeabulary in both the stem and answer ehoiees of items 
eaused diffieulty for students as well as words with negative prefixes (e.g., “dis”). Implieations 
for this researeh are that readability is eorrelated with voeabulary (see Rand Reading Study 
Group, 2002) and that eonstruet and non-eonstruet voeabulary must be elearly defined in order 
to inerease aeeessibility of assessments. 
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Introduction 






Recent changes in Federal legislation (including the No Child Left Behind Act of 2001) have 
placed greater emphasis on large-scale assessment data as a measure of student learning. Be- 
cause of this focus, schools and school districts are accountable for ensuring student success 
on assessments. According to federal guidelines, all students must succeed on state-mandated 
assessments of learning, including students with disabilities. 

In order to ensure that all students, including students with disabilities, are taking assessments 
that are accessible to a wide variety of student needs, Thompson, Johnstone, and Thurlow (2002) 
recommended that a “universal design” approach be used when designing assessments. Uni- 
versal Design of Assessment (UDA) is broadly defined as assessments that are “designed and 
developed from the beginning to allow participation of the widest possible range of students, 
and to result in valid inferences about performance for all students who participate in the as- 
sessment” (Thompson et ah, 2002, p.5). 

The term universal design was originally used as an architectural concept (Center for Universal 
Design, n.d.). Philosophies of access and barrier removal were the foundations for early universal 
design approaches. As part of this design philosophy, ramps, elevators, expanded doorways, 
signs, bathrooms, and other features do not have to be added or modified at additional expense 
after the completion of a building. Rather, as part of universal design, these features are sketched 
into structures’ blueprints from the beginning. The promise of universal design is that some of 
the same architectural features that accommodate people with disabilities also benefit many 
others, including senior citizens, families with young children, and delivery people. 

Thompson et al.’s definition was also initially used to define UDA of general education assess- 
ments. The authors further explained UDA by defining seven “elements” of universal design, 
including: (1) inclusive assessment population; (2) precisely defined constructs; (3) accessible 
non-biased items; (4) amenable to accommodations; (5) simple, clear, and intuitive instructions 
and procedures; (6) maximum readability and comprehensibility; and (7) maximum legibility. 

This study builds on a series of previous studies conducted by the National Center on Educa- 
tional Outcomes. Beginning in 2002, Thompson et ah, Johnstone (2003), Johnstone, Thomp- 
son, Moen, Bolt, and Kato (2005), and Johnstone, Bottsford-Miller, and Thompson (2006) all 
have sought to operationalize the term “universal design” as it applies to assessments, and to 
understand the practical implications of universal design processes. One common theme in all 
of the studies was that language was a particularly important characteristic of items that were 
and were not accessible. 

This study focused specifically on maximum readability and comprehensibility of items in read- 
ing assessments. The study was particularly salient to ongoing research on the accessibility of 
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general education reading assessments (see http://www.narap.info) because it sought to better 
understand what particular characteristics make a test item readable and accessible. 

Readable and comprehensible language was also selected as the focus for this study because 
it is a somewhat controversial concept in the field of measurement. Tampering with how items 
are written in order to increase readability has recently been challenged because of the potential 
effects that language changes may have on the intended constructs of items (Gong & Marion, 
2003). In addition, language changes possibly make test items less authentic and reduce their 
similarity to real-world applications (Kahl, 2006). 

Much of the research on the need for readable and comprehensible language as an aspect of 
universal design comes from literature on students who are English language learners. In some 
cases, readable text is operationalized as language that is concise and efficient, allowing read- 
ers to quickly access test items. For example, Abedi, Hofstetter, Baker, and Lord (2001) found 
that English language learners often have higher proficiency in mathematics than is reflected 
on large-scale mathematics tests, and recommended that language be clarified to reflect essen- 
tial constructs. Likewise, Kopriva, Winter, Emick, and Chen (2006) noted that there are both 
target and non-target language skills required for assessment items and that by clarifying these 
language needs it is easier to create more concise language around relevant constructs. Once 
construct-relevant language is identified, the process of winnowing away superfluous language 
may improve assessment results and, more importantly, assessment validity. 

Readable and comprehensible language can be quantified on readability scales. Shaftel, Belton- 
Kocher, Glasnapp, and Poggio (2006) applied formulae to predict how readable particular items 
were on a test. The authors found that students who typically struggle with reading the English 
language (English language learners and students with disabilities) may be at risk for items 
containing pronouns, complex verbs, and irrelevant language. In such cases, determining what 
students do and do not know is complicated by item language, which at times may not allow 
English language learners and students with disabilities to demonstrate their true capabilities. 

As early as 1987, Rakow and Gee outlined a series of considerations for making test items more 
readable and comprehensible. These included: 

• Students would likely have the experiences and prior knowledge necessary to 
understand the item 

• Vocabulary is appropriate for the intended grade level 

• Sentence complexity is appropriate for the intended grade level 

• Definitions and examples are clear and understandable 
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• Required reasoning skills are appropriate for students’ cognitive level 

• Relationships are made clear through precise, logical connectives 

• Content within items is clearly organized 

• Graphs, illustrations, and other graphic aids facilitate comprehension 

• Questions are clearly framed 

The RAND Reading Study Group (2002) also stated that difficulty of text may vary due to 
“complex Boolean expressions.” Such expressions are challenging because “the respondent 
needs to keep track of different options and possibilities” (p. 96). In the case of negative ex- 
pressions, an unnecessarily high cognitive load may be added to items that employ negatives 
within items (e.g., “Which of the following is not a reason why the captain wanted to turn her 
ship around?”). 

The RAND Reading Study Group also found a strong correlation between the level of vocabu- 
lary on items and their readability. To address the danger in unnecessarily inflating the difficulty 
level of items, Haladyna and Downing (2004) suggested removing nuisance variables (such as 
challenging vocabulary that is not relevant to the construct tested) as a way of improving the 
validity of tests. Reduction in the level of non-construct vocabulary can be coupled with reduc- 
ing the density of text (i.e., text that does not “pack too many idea units” or propositions into a 
single clause) (RAND Reading Study Group, 2002, p. 96) to create readable, comprehensible 
test items. 

In summary, a variety of sources provide information on how to operationalize “readable and 
comprehensible” text. The purpose of this research was to take the cumulative information from 
these sources to create readable and comprehensible text in reading items and then to examine 
their impact on students with learning disabilities. In order to determine the qualitative impact 
of such interventions, we chose to use think aloud methods. 

According to Ericsson and Simon (1994), think aloud verbalizations (where students say all their 
thoughts aloud while they are approaching an item) are an important data source for understand- 
ing cognitive activities. The reason why think aloud data are so rich is because all cognitive 
processes travel through short-term memory and are spoken at the time they are processed. Thus, 
think aloud methods provide researchers with qualitative information about the reasons why 
students may struggle on test items (Leighton, 2004). Such data are informative for purposes of 
addressing universal design issues in large-scale assessment items (Johnstone et ah, 2006). 
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Methods 



Overview 

In this study, released National Assessment of Edueational Progress (NAEP) items were used as 
reference or “standard” items from which changes were made to create items that fit our definition 
of readable and accessible. Specifically, Grade 8 passages and their corresponding items were 
examined. Researchers at the National Center for the Improvement of Educational Assessment 
(NCIEA) developed all protocols. Each protocol contained a passage (one informational and 
one literary passage were used for this study), followed by a series of items. Eor each passage, 
standard NAEP released items and modified items designed to be readable and comprehensible 
were interspersed with one another. Items from the standard set of NAEP released items were 
reviewed by NCIEA researchers for the following characteristics. 

• Use of pronouns. A sentence with excessive pronoun use may cause a student to lose 
track of the main point of reference in an item. Eor example “Jake left his house to go 
to visit his grandmother at her house. What was the first thing she said?” 

• Use of negatives. An unnecessarily high cognitive loading may be added to items that 
employ negatives within items. 

• Vocabulary. There is strong evidence that suggests an inverse correlation between 
the level of vocabulary in text and its readability (RAND Reading Study Group, 

2002). Eor the purposes of this study, we were not concerned with vocabulary in 
passages, but with the vocabulary level within items. Unless the construct of an item 
is to test vocabulary skills, we attempted to reduce vocabulary demands as a principle 
of access. Eor example, “The author’s conclusion was a somber reminder of what?” 
might be rephrased as “The ending of the book was a sad reminder of what?” 

• Non-construct subject area language (specialized vocabulary). English language 
arts has a specialized vocabulary of its own (e.g., characterization, denouement, prose, 
iambic pentameter, etc.). When these words are part of the intended construct, it is 
appropriate to include them in items. When these terms are extraneous to the intended 
construct, they may introduce “nuisance variables” (Haladyna & Downing, 2004). We 
attempted to remove any non-construct subject area language from items. 

• Complex sentences, dense text. The RAND Reading Study Group (2002) defined 
sentences with “dense clauses” as sentences that “pack too many constituents or idea 
units (i.e., propositions) within a single clause” (p. 96). Eikewise, the RAND Group 
defined sentences with “dense noun phrases” as sentences with “too many adjectives 
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and adverbs modifying the head noun” (p. 96). Items with either one of the above 
sentenee types were rewritten for clarity. 

When items contained such features, they were modified to reflect language that was more 
readable and comprehensible. 

Sample 

This study took place in the summer of 2007. Eight students participated in the study. All eight 
had recently completed eighth grade. All of the students had diagnosed specific learning dis- 
abilities in the area of reading and were participants in a summer reading enrichment program, 
but did not all have the same home school. All received services for their disability in public 
or private schools and were familiar with large-scale assessments. Because the students paid 
tuition for the private reading enrichment program, it is likely that they were all from middle or 
high socioeconomic status (SES), although we did not collect SES data. None of the students, 
however, reported familiarity with the NAEP assessment of reading (from which sample items 
were derived). 

Among the participants, five students were male and three students were female. One of these 
students was African American, one was Asian American, and one was Hispanic. None was an 
English language learner. 

Procedures 

Eor this study, we followed a standard protocol for each student. Eirst, each student participated 
in a practice activity in order to better understand how think aloud activities worked. Some of 
the students had experience with think aloud activities because their teachers had used them in 
reading and mathematics instruction. None of the students had difficulty with the process. When 
students did struggle, they were reminded to “keep talking” (Ericsson & Simon, 1994). 

Each student read either a literary or informational text passage. Students were randomly as- 
signed to passage types (informational or literary) and “forms” (each form had both standard 
and modified items, but ordering of items was different across forms). The different forms were 
designed so that every other question was standard or modified (e.g.. Item 1 on Eorm A was a 
standard item, item 2 was a modified item, etc. Eor Eorm B, item 1 was a modified item, item 
2 was a standard item, etc.). Thus, different forms had “matched pairs” of items that were writ- 
ten in standard/modified formats. To this end, every student was likely to answer both standard 
and modified items, but each student only read one particular passage type (informational or 
literary). 
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In this study, students read a passage either silently or aloud (students were given the choice of 
how they would like to read). Of the eight participants, only one participant chose to read the 
passage aloud. Once students completed reading the passage, they were asked to answer items 
one at a time while thinking aloud. Students were asked to read all items aloud while answering 
them. The interaction between researcher and student then took place in two phases. 

In phase 1 , the student engaged in a think aloud while the interviewer used only passive prompts 
to encourage the child to think out loud. During this phase, students read each item aloud and 
described their problem solving processes while attempting to correctly answer the item. In 
phase 2, the interviewer asked the child specific questions to probe his or her understanding of 
the child’s cognitive process. Questions that were asked of the students included: 

• How did you get that answer? 

• What makes you believe that answer is the right one? 

• Was there anything that seemed tricky about this question? 

• Was there anything that confused you about this question? 

• Were there any words in this question that you did not know? 

• Could we do anything, change the item in any way, to make it clearer to you? 

• Did you think this passage was easy or hard to read? 

• Were there any words you did not understand? 

• Was any part of it confusing to you? 

• Could you find the answers to the questions in the passage? 

• Other questions related to content of passage or item. 

Researchers praised students throughout the process for thinking out loud and verbalizing, but 
did not provide any information to students about whether particular answers were correct or 
incorrect. The purpose of providing feedback to students was to encourage them to continue 
verbalizing, because this was the primary data source for this study. 

Each think aloud activity took approximately one hour to complete. Students were videotaped 
at two angles to capture student verbalizations plus any activities related to pointing, circling, 
or other physical acts in which students engaged during the think-aloud session. 
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Analysis 



Data were transcribed from videotapes and coded using a standard coding form. This form asked 
reviewers to note the following information: 

• Student identification number 

• Whether the question was standard or modified 

• Student comments relevant to the item 

• Whether the student answered the question correctly or incorrectly 

• If the student answered the question incorrectly, an analysis of her or his error 

• Data supporting conclusions made from the error analysis 

• Any advice the student had for test makers 

Once coding was completed, secondary analyses were conducted to (a) determine the number 
of correct and incorrect answers for individual items, (b) make comparisons of matched pairs 
of standard and modified items, and (c) find qualitative patterns in errors made by students. 

There was no way to test for statistical significance with the small sample of students who par- 
ticipated in this research. Thus, descriptive statistical information was compared with qualitative 
information for each item. 



Results 

Not all students answered all items in this research. This was because students were randomly 
assigned to either informational or literary passages, and further randomly assigned to “stan- 
dard” (odd numbered items that students completed were standard and even numbered items 
were modified) or “modified” (odd numbered items students completed were modified and even 
numbered items were standard) conditions. Furthermore, we stopped all think aloud sessions 
after one hour to prevent student fatigue; thus, some items at the end of protocols were not an- 
swered at all. Each item pair (its standard and modified versions) is reported below, along with 
quantitative information on correct/incorrect responses and qualitative information supplied 
by students. After all items are reported, overall information on the number of correctly and 
incorrectly answered items by type (standard vs. modified) is reported. 
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Overall Results 



Among students who answered both standard and modified questions, only one student answered 
more standard questions correet than modified questions. Overall, students answered only 46% 
of standard items correctly and answered 72% of modified items correctly. Student responses 
ranged from one student answering three of four standard items (75%) correctly and only one 
of three (33%) modified questions correctly to one student answering only one of four standard 
questions (25%) correctly and three out of three modified questions correctly (100%). Although 
sample sizes were small, the general overall effect seemed to be that students achieved better 
results on modified items than on standard items. Table 1 represents overall results for standard 
and modified items. 



Table 1. Overall Results for Standard and Modified Items 



Item Type 


Percent Correct Overall 


Range Percent Correct by Participant 


Standard 


46 


25%-75% 


Modified 


72 


50%-100% 



Item Level Results 

Grade 8, Matched Pair 1 (Literary Passage) 

For the first item, the standard version of the item asked the students to “Use the dictionary 
entry below to answer the question, what meaning of fast is used in line 4?” For this particular 
item, the modified version of the item simply asked “Which meaning of fast is used in line 4?” 
Therefore, the modifying process involved reducing the text load of the item. 

Both the standard and modified version of the item were answered correctly 50% of the time 
(1 out of 2 students answered each version correctly). Two of the four students looked back at 
the passage to examine the vocabulary word in question. One student who looked back at the 
text (and had the modified item) answered correctly while another student (with the standard 
item) struggled with the entire item. Although she looked back, she said she was “confused” and 
wanted to go on to the next item. Students who answered the item correctly (both in modified 
and standard formats) used deduction to determine the correct answer. Students who answered 
the item incorrectly did so because they did not grasp the meaning of “fast” within the context 
of the story. These errors were errors in construct-relevant skills. 

Grade 8, Matched Pair 2 (Literary Passage) 

The standard version of the second item begins with “In lines 10-13, the poet uses a simile ‘his 
brown skin hung in strips/like wallpaper’ to emphasize the fish’s.” The modified version of the 
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phrase asked “In lines 10-13 ‘his brown skin hung in strips/ like ancient wallpaper’ is used to 
emphasize the fish’s.” This modified version reduced the text load and decreased the vocabulary 
demands for the item. All students answered this item correctly in both versions (three correct 
answers, zero incorrect answers for the standard item and one correct answer, zero incorrect 
answers for the modified item). Two students (both with standard items) answered quickly from 
memory. The other two students looked back at the passage to answer correctly. 

Grade 8, Matched Pair 3 (Literary Passage) 

The standard version of the item asked “The image of the fish in lines 43 through 50 develops 
the fish as having a character that is?” The modified version asked “Lines 43-50 describe the fish 
as.” The modified version contained only seven words and eliminated the word “character.” 

In both versions of the item, one student answered the item correctly (50%) and one student 
answered the item incorrectly (50%). One of the students who answered the item incorrectly 
(modified version) eliminated answers she thought were incorrect, but used faulty logic. The 
student who answered the standard item incorrectly did so because he could not read the word 
“character.” In this case, an error analysis demonstrated that the use of the vocabulary “charac- 
ter” may have been the cause for one error. 

Grade 8, Matched Pair 4 (Literary Passage) 

The standard version of item 4 asked “When the poet says ‘Like medals with their ribbons frayed 
and wavering’ (lines 61-62) she is referring to.” The modified version of this item simply asked 
“ ‘Like medals with their ribbons frayed and wavering’ (lines 61 and 62) refers to.” This item 
removed the pronoun “she” and reduced the word count on the item from 18 to 14 words. 

None of the students who answered the standard question answered it correctly (0 of 2 students). 
One out of two students answered the modified version correctly. Error analysis of incorrect 
answers indicates that the two students who answered the standard version incorrectly were 
unable to infer meaning from the quote provided. The student who answered the modified ver- 
sion incorrectly was confused by a vocabulary word in the item. The student who answered 
the question correctly re-read the quote, then used deduction (eliminating perceived incorrect 
answers). It appears as if the elimination of “she” did not make any difference for students. The 
connection between students’ inability to infer meaning in the standard version of the item and 
characteristics of the item are unclear for this item. 

Grade 8, Matched Pair 5 (Literary Passage) 

The standard version of this item asked students to “Reread the lines beginning with ‘I admired’ 
(line 45) and ending with ‘aching jaw’ (line 64). The speaker most admires the fish because she 
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thinks he.” The standard version of this item contained two pronouns (she for the author and he 
for the fish) and 26 words. The modified version of this text contains 12 words and only one pro- 
noun (for the fish). The modified version asks “The speaker most admires the fish (lines 45-64) 
because of his.” The word “most” was also presented in bold print in the modified version. 

The two students who answered the modified question incorrectly did so because they did not 
know one of the terms used in the response choices. One student could not pronounce two of 
the words in the response choices. The one student who answered the modified version of the 
item incorrectly did so because he depended on his prior knowledge rather than rereading the 
text to find the correct answer. 

In this item, it appears as if the vocabulary level of the response choices in the item were ques- 
tionable. It is unknown why the same response choices were less problematic for the modified 
version of the item, but evidence points to a possible compounding effect of challenging items 
and subsequent answer choices. This item also demonstrated the problematic nature of students 
depending on prior knowledge to answer questions. Although activating prior knowledge is an 
important tool in the reading process, if prior knowledge is not combined with information in 
actual text, it may lead students astray. 

Grade 8, Matched Pair 6 (Literary Passage) 

The standard item in matched pair 6 asks students “Which of the following best describes the 
person speaking in the poem?” The modified version of this item begins “Which best describes 
the speaker of the poem?” A second set of revisions in the item also reduces the text load in the 
latter part of the item by one word (making the total word count 12 words for the standard item 
and 8 words for the modified item). 

All students answered this item correctly. Two students depended on memory and two students 
eliminated answer choices they felt were incorrect until they arrived at the correct answer. 
Overall, it appeared as if the reduction in word count (only four words) and the use of bold print 
for the word best did not make a difference for this item as the item was easy for the students 
in both versions. 

Grade 8, Matched Pair 7 (Informational Passage) 

The standard version of this item asked “What did the immigrants dislike most about their trip 
to America?” When modified, the item read “What upset the immigrants most about their trip?” 
For this item, the negative term “dislike” was replaced by the word “upset.” There was clear 
evidence in this item that eliminating a negative term (dislike) changed how students interpreted 
the item. Both students who answered the standard item answered incorrectly while both students 
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who answered the modified item answered correetly. One student answered ineorrectly beeause 
he relied on memory and did not revisit the passage; the other student read the word “dislike” 
as “like.” The latter student elearly demonstrated the effeet of negative words that have prefixes 
such as “dis” or “non” on readers’ comprehension of test items. 

Grade 8, Matched Pair 8 (Informational Passage) 

The standard version of this item was worded “The statement that immigrants had to ‘contend 
with border guards, thieves, and crooked immigration agents’ means that the immigrants?” The 
modified version of this item was re-written for clarity of purpose, “Immigrants had to ‘contend 
with border guards, thieves, and crooked immigration agents.’ This means.” The language for 
the item was simplified and the word “that” was removed twice to present a clearer message 
to students. 

Two students answered the standard version of this item correctly. The student who answered 
incorrectly struggled to find the quote in the passage and claimed that the item was too difficult 
to understand. He suggested the item read “what was the main idea of the story.” It is unknown 
whether the modified version of this item would have helped this student answer correctly, or if 
the item’s requirements were simply too difficult for this student. The one student who answered 
the modified item did so correctly. 

Grade 8, Matched Pair 9 (Informational Passage) 

The standard version of this item asked students to determine “What most worried the immigrants 
about the medical examinations?” The modified version of the story attempted to use language 
for which Grade 8 students may be more familiar, and asked students “What were immigrants 
most afraid of during medical examinations?” 

One out of two students answered the standard item correctly. The student who answered incor- 
rectly looked for the section referred to in the answer choices, but was unable to find those sec- 
tions in the text. It is unknown whether the modified version of this item would have helped this 
student, but his error appeared to be the result of his inability to recall features of the story, not 
an issue related to the item. Both students who answered the modified item did so correctly. 

Grade 8, Matched Pair 10 (Informational Passage) 

The standard version of this item asked students “The United States eventually reduced the 
number of immigrants allowed to enter the country because?” The modified version of this 
item asked “Why did the United States reduce the number of immigrants?” One of two students 
who answered the standard version of this question did so correctly. The student who answered 
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incorrectly did so because he answered based on his own prior knowledge of the issue, which 
tainted his response. Only one student took the modified version of this item and answered in- 
correctly. This student did so because he thought the word “reduce” meant to “make more.” In 
this item, the version students answered did not seem to make a difference. It is clear, however, 
that defining the vocabulary term in the stem of the item (e.g., implying that reduction means 
making less of something) may have changed the outcome of this item. 

Grade 8, Matched Pair 11 (Informational Passage) 

The standard version of this item asked “In the passage, what is the main purpose of the sub- 
headings?” The modified version of this question simply omitted the phrase “In this passage” 
and asked “What is the purpose of the subheadings?” 

Three students took the standard version of this item, but only one answered correctly. The two 
students who answered incorrectly did so because they were unfamiliar with the terminology 
used in the item (subheadings). The one student who answered the standard item correctly and 
the one student who answered the modified item correctly did so because they were familiar 
with the expository convention of subheadings from school. In this item, design was not a fac- 
tor because student success depended on knowledge of the particular convention in the item. 
Reducing the number of words in the item had neither a positive nor negative effect. 

Grade 8, Matched Pair 12 (Informational Passage) 

This item again referred to a convention used in expository text. The standard version of the 
text asked “In the article, how do the quotations by immigrants relate to the sections of the 
article?” The modified version of this item asked students “The author most likely included the 
quotations by immigrants to.” 

Results for this item were similar to the previous item. One student answered the standard ver- 
sion of this item and two students answered the modified version. All students answered the 
item correctly because they were familiar with the convention to which the item referred. In this 
case, modifying the item did not appear necessary because all students were able to find the key 
convention in question and answer based on their prior knowledge of the convention. 

Grade 8, Matched Pair 13 (Informational Passage) 

The final item related to the main idea of the passage. The standard version of this item asked 
students “The main point the author is making in this passage is about the?” The modified ver- 
sion asked “This passage is mostly about the.” 
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Although the reduction in language complexity was hypothesized to make the item more ac- 
cessible, all students who answered this item did so correctly (two students for the standard 
version and one student for the modified version). All students answered this item very quickly, 
indicating that finding the “main idea” or what a passage is “mostly about” is a strategy for 
which students are very familiar. In this case, simplified language was not necessary because 
of the ease with which students approached the construct. 

Summary of Results 

Results from this study indicate that modified items may be more accessible to students with 
learning disabilities. As a whole, students answered modified items correctly at a rate of 72% 
compared to answering standard items correctly at a rate of 46%. The differences indicate an 
overall positive effect on student outcomes when using items that are readable and compre- 
hensible, but individual items provide further information on what specific item modification 
strategies were most meaningful. 

For example, the text load (word count) was reduced for modified items 1-5 and 10-12, but 
reduction in word count did not make a difference in student results. Likewise, the modified 
version of items 5 and 6 contained bold print. Students commented that they liked the bold print, 
but it did not seem to affect results. Errors directly related to design of items, however, were 
present. For example, item 3 contained a vocabulary term that was unknown to the student. 
Likewise, item 7 (standard) had a negative prefix (dis) that was removed in the modified item. 
The negative prefix was the cause of student error. Items 5 and 10 both had item-related errors 
that were not addressed in our modification process, but are instructive in their results. Item 5 
had answer choices that were too difficult for students to read and item 10 had a vocabulary word 
in the item’s stem that was problematic in both the standard and modified versions of the item. 
Both items appeared to be comprehension items, but the vocabulary in the items served as an 
access point. It is unknown whether making changes to the vocabulary level of these items would 
affect the intended construct. Because the items were not vocabulary items, it is possible that 
parenthetical definitions of terms might have helped students, but further research is needed to 
determine the effects of parenthetical definitions on both student outcomes and test validity. 

For most items, the source of student error or success was the particular student’s knowledge 
of the content and story. For items 6, 9, and 1 1 students answered incorrectly because of flawed 
reading strategies. In contrast, there were no errors for either the standard or modified version 
of items 12 and 13 because all students understood the constructs tested. The sources of error 
were difficult to distinguish from student responses to item 8. Table 2 shows how each item was 
modified, and what the likely sources of error were. 
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Table 2. Item Modification Strategies and Student Errors 



Item 


Modification 


Source of Error 


1 


Reduced text load (word count) 


Student reading strategies 


2 


Reduced text load and vocabulary level 


Student reading strategies 


3 


Reduced text load and vocabulary level 


Item stem vocabulary* 


4 


Removed pronouns and reduced vocabulary 


Student reading strategies 


5 


Reduced word count, important words bold 


Vocabulary in answer choices* 


6 


Reduced text load, important words bold 


Student reading strategies 


7 


Replacement of negative word 


Item negative term* 


8 


Rewritten for clarity of purpose 


Error source unknown 


9 


Rewritten with familiar language 


Student reading strategies 


10 


Reduced text load 


Undefined vocabulary in stem* 


11 


Reduced text load 


Student reading strategies 


12 


Reduced text load 


No error 


13 


Reduced text load 


No error 



‘Indicates item-related sources of error. 



Discussion 

As states design tests that are intended to be accessible to the diverse general assessment popula- 
tion (e.g., universally designed assessments), continued research is important for defining how 
test items can become more “readable and comprehensible.” To this end, specific research on 
what strategies are most effective in making test items more accessible can inform practitioners 
on ways to make items accessible to a wide variety of students, including students with learn- 
ing disabilities. 

This study was limited because of its sample size. Only eight participants engaged in think 
aloud activities, so results may not generalize to the wider population. Furthermore, the sample 
was drawn from public and private school students who were actively engaged in reading im- 
provement programs over the summer. Thus, the particular students who participated may not 
represent the heterogeneity of students with learning disabilities in the United States. It is likely 
that the particular sample with which we worked was from a higher socioeconomic level than 
the general population of students with learning disabilities in the U.S. It will be important to 
conduct additional research, both increase sample size and to increase the diversity of students 
within the sample. 
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Despite limitations of this study, informative results emerged. Beeause think aloud methods 
ean provide in-depth information about student problem-solving proeesses for test items, error 
analyses eould be eondueted on all items to determine the likely souree of error. For many items, 
students answered eorreetly or ineorreetly based on reading strategies and their knowledge of 
reading eonventions. Item modifieations sueh as redueing the number of words in an item and 
highlighting bold words did not seem to have any effeet on student results. 

Four items, however, provided information on ways that items ean be made more readable and 
eomprehensible. All of these items related to voeabulary. As noted above, English language 
arts has speeifie voeabulary and terms that are often tested in large seale assessments. In this 
researeh, we found that other words (not eonstruets that were tested) eonfused students. Beeause 
of this, it may be important to eonduet further researeh on the effeets of undefined, non-eonstruet 
voeabulary in item stems; ehallenging non-eonstruet voeabulary in item response ehoiees; and 
words with negative prefixes (sueh as “dis”). 

In this researeh, it was not the number of words that ehallenged students, but the words them- 
selves. As the assessment eommunity eontinues to attempt to understand how to make test 
items as aeeessible as possible without ehanging intended eonstruets, eonsideration of the role 
of voeabulary within the items themselves is important. Findings from this researeh indieate 
that non-eonstruet voeabulary may obseure what knowledge we ean gain about student read- 
ing eomprehension from tests. Sueh findings imply there is a need for both further and more 
eomprehensive researeh on the effeets of voeabulary, and eontinued review by eontent experts 
to ensure the level of non-eonstruet voeabulary in items is appropriate. 
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